1.
Bioinformatics ; 37(21): 3788-3795, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34213536

ABSTRACT

MOTIVATION: The negative binomial distribution has been shown to be a good model for counts data from both bulk and single-cell RNA-sequencing (RNA-seq). Gaussian process (GP) regression provides a useful non-parametric approach for modelling temporal or spatial changes in gene expression. However, currently available GP regression methods that implement negative binomial likelihood models do not scale to the increasingly large datasets being produced by single-cell and spatial transcriptomics. RESULTS: The GPcounts package implements GP regression methods for modelling counts data using a negative binomial likelihood function. Computational efficiency is achieved through the use of variational Bayesian inference. The GP function models changes in the mean of the negative binomial likelihood through a logarithmic link function and the dispersion parameter is fitted by maximum likelihood. We validate the method on simulated time course data, showing that it identifies changes in over-dispersed counts data better than methods based on Gaussian or Poisson likelihoods. To demonstrate temporal inference, we apply GPcounts to single-cell RNA-seq datasets after pseudotime and branching inference. To demonstrate spatial inference, we apply GPcounts to data from the mouse olfactory bulb to identify spatially variable genes and compare with two published GP methods. We also provide the option of modelling additional dropout using a zero-inflated negative binomial. Our results show that GPcounts can be used to model temporal and spatial counts data in cases where simpler Gaussian and Poisson likelihoods are unrealistic. AVAILABILITY AND IMPLEMENTATION: GPcounts is implemented using the GPflow library in Python and is available at https://github.com/ManchesterBioinference/GPcounts along with the data, code and notebooks required to reproduce the results presented here. The version used for this paper is archived at https://doi.org/10.5281/zenodo.5027066.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling, Statistical Models, Animals, Mice, Bayes Theorem, RNA-Seq, RNA Sequence Analysis/methods
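
The negative binomial likelihood with a logarithmic link mentioned in the abstract above can be illustrated with a minimal sketch. This is not GPcounts code: the function name `nb_log_pmf`, the parameter name `alpha`, and the parameterisation (variance = mu + alpha * mu^2) are assumptions made for illustration, and the latent value `f` merely stands in for a GP function evaluation.

```python
import math

def nb_log_pmf(y, f, alpha):
    """Log-probability of count y under a negative binomial with mean exp(f).

    Assumed parameterisation (not taken from GPcounts): mean mu = exp(f),
    variance mu + alpha * mu**2, with alpha > 0 the dispersion.
    """
    mu = math.exp(f)   # logarithmic link: latent f maps to a positive mean
    r = 1.0 / alpha    # "size" parameter of the negative binomial
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )

# Example: probability of observing zero counts when the latent value is log(5)
p_zero = math.exp(nb_log_pmf(0, math.log(5.0), alpha=0.5))
```

As `alpha` approaches zero the variance approaches the mean, so under this parameterisation the Poisson model appears as a limiting case.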
2.
Elife ; 9, 2020 12 04.
Article in English | MEDLINE | ID: mdl-33274713

ABSTRACT

High-throughput testing of drugs across molecularly characterised cell lines can identify candidate treatments and discover biomarkers. However, the cells' response to a drug is typically quantified by a summary statistic from a best-fit dose-response curve, whilst neglecting the uncertainty of the curve fit and the potential variability in the raw readouts. Here, we model the experimental variance using Gaussian Processes, and subsequently leverage uncertainty estimates to identify associated biomarkers with a new Bayesian framework. Applied to in vitro screening data on 265 compounds across 1074 cancer cell lines, our models identified 24 clinically established drug-response biomarkers, and provided evidence for six novel biomarkers by accounting for association with low uncertainty. We validated our uncertainty estimates with an additional drug screen of 26 drugs, 10 cell lines with 8 to 9 replicates. Our method is applicable to any dose-response data without replicates, and improves biomarker discovery for precision medicine.


Subjects
Antineoplastic Agents, Tumor Biomarkers/analysis, Drug Discovery/methods, Drug Discovery/standards, Statistics as Topic/methods, Tumor Cell Line, High-Throughput Screening Assays/methods, High-Throughput Screening Assays/standards, Humans
3.
Arthritis Res Ther ; 21(1): 47, 2019 02 06.
Article in English | MEDLINE | ID: mdl-30728072

ABSTRACT

OBJECTIVE: We applied systems biology approaches to investigate circadian rhythmicity in rheumatoid arthritis (RA). METHODS: We recruited adults (aged 16-80 years) with a clinical diagnosis of RA (active disease [DAS28 > 3.2]). Sleep profiles were determined before inpatient measurements of saliva, serum, and peripheral blood mononuclear leukocytes (PBML). Transcriptome and proteome analyses were carried out by RNA-SEQ and LC-MS/MS. Serum samples were analysed by targeted lipidomics, along with serum from mouse collagen-induced arthritis (CIA). Bioinformatic analysis identified RA-specific gene networks and rhythmic processes differing between healthy and RA. RESULTS: RA caused greater time-of-day variation in PBML gene expression, and ex vivo stimulation identified a time-of-day-specific RA transcriptome. We found increased phospho-STAT3 in RA patients, and some targets, including phospho-ATF2, acquired time-of-day variation in RA. Serum ceramides also gained circadian rhythmicity in RA, which was also seen in mouse experimental arthritis, resulting from gain in circadian rhythmicity of hepatic ceramide synthases. CONCLUSION: RA drives a gain in circadian rhythmicity, both in immune cells and systemically. The coupling of distant timing information to ceramide synthesis and joint inflammation points to a systemic re-wiring of the circadian repertoire. Circadian reprogramming in response to chronic inflammation has implications for inflammatory co-morbidities and time-of-day therapeutics.


Subjects
Experimental Arthritis/genetics, Rheumatoid Arthritis/genetics, Circadian Rhythm, Mononuclear Leukocytes/metabolism, Adolescent, Adult, Aged, Aged 80 and over, Animals, Experimental Arthritis/immunology, Experimental Arthritis/metabolism, Rheumatoid Arthritis/immunology, Rheumatoid Arthritis/metabolism, Ceramides/blood, Female, Gene Expression Profiling/methods, Humans, Inflammation/genetics, Inflammation/immunology, Inflammation/metabolism, Mononuclear Leukocytes/immunology, Male, Inbred DBA Mice, Middle Aged, Proteomics/methods, Young Adult
4.
Genome Biol ; 19(1): 65, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29843817

ABSTRACT

High-throughput single-cell gene expression experiments can be used to uncover branching dynamics in cell populations undergoing differentiation through pseudotime methods. We develop the branching Gaussian process (BGP), a non-parametric model that is able to identify branching dynamics for individual genes and provide an estimate of branching times for each gene with an associated credible region. We demonstrate the effectiveness of our method on simulated data, a single-cell RNA-seq haematopoiesis study and mouse embryonic stem cells generated using droplet barcoding. The method is robust to high levels of technical variation and dropout, which are common in single-cell data.


Subjects
Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing/methods, RNA Sequence Analysis/methods, Animals, Embryonic Stem Cells/metabolism, Hematopoiesis/genetics, Mice, Normal Distribution, Single-Cell Analysis
5.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1948-1963, 2018 08.
Article in English | MEDLINE | ID: mdl-28841550

ABSTRACT

Missing data and noisy observations pose significant challenges for reliably predicting events from irregularly sampled multivariate time series (longitudinal) data. Imputation methods, which are typically used for completing the data prior to event prediction, lack a principled mechanism to account for the uncertainty due to missingness. Alternatively, state-of-the-art joint modeling techniques can be used to model the longitudinal and event data jointly and to compute event probabilities conditioned on the longitudinal observations. These approaches, however, make strong parametric assumptions and do not easily scale to multivariate signals with many observations. Our proposed approach consists of several key innovations. First, we develop a flexible and scalable joint model based upon sparse multiple-output Gaussian processes. Unlike state-of-the-art joint models, the proposed model can explain highly challenging structure including non-Gaussian noise while scaling to large data. Second, we derive an optimal policy for predicting events using the distribution of the event occurrence estimated by the joint model. The derived policy trades off the cost of a delayed detection against that of incorrect assessments and abstains from making decisions when the estimated event probability does not satisfy the derived confidence criteria. Experiments on a large dataset show that the proposed framework significantly outperforms state-of-the-art techniques in event prediction.

6.
IEEE Trans Pattern Anal Mach Intell ; 37(2): 383-93, 2015 Feb.
Article in English | MEDLINE | ID: mdl-26353249

ABSTRACT

In this publication, we combine two Bayesian nonparametric models: the Gaussian Process (GP) and the Dirichlet Process (DP). Our innovation in the GP model is to introduce a variation on the GP prior which enables us to model structured time-series data, i.e., data containing groups where we wish to model inter- and intra-group variability. Our innovation in the DP model is an implementation of a new fast collapsed variational inference procedure which enables us to optimize our variational approximation significantly faster than standard VB approaches. In a biological time series application we show how our model better captures salient features of the data, leading to better consistency with existing biological classifications, while the associated inference algorithm provides a significant speed-up over EM-based variational inference.


Subjects
Cluster Analysis, Computational Biology/methods, Normal Distribution, Computer Simulation, Gene Expression Profiling, Nonparametric Statistics
7.
Bioinformatics ; 31(24): 3881-9, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26315907

ABSTRACT

MOTIVATION: Assigning RNA-seq reads to their transcript of origin is a fundamental task in transcript expression estimation. Where ambiguities in assignments exist due to transcripts sharing sequence, e.g. alternative isoforms or alleles, the problem can be solved through probabilistic inference. Bayesian methods have been shown to provide accurate transcript abundance estimates compared with competing methods. However, exact Bayesian inference is intractable and approximate methods such as Markov chain Monte Carlo and Variational Bayes (VB) are typically used. While providing a high degree of accuracy and modelling flexibility, standard implementations can be prohibitively slow for large datasets and complex transcriptome annotations. RESULTS: We propose a novel approximate inference scheme based on VB and apply it to an existing model of transcript expression inference from RNA-seq data. Recent advances in VB algorithmics are used to improve the convergence of the algorithm beyond the standard Variational Bayes Expectation Maximization algorithm. We apply our algorithm to simulated and biological datasets, demonstrating a significant increase in speed with only very small loss in accuracy of expression level estimation. We carry out a comparative study against seven popular alternative methods and demonstrate that our new algorithm provides excellent accuracy and inter-replicate consistency while remaining competitive in computation time. AVAILABILITY AND IMPLEMENTATION: The methods were implemented in R and C++, and are available as part of the BitSeq project at github.com/BitSeq. The method is also available through the BitSeq Bioconductor package. The source code to reproduce all simulation results can be accessed via github.com/BitSeq/BitSeqVB_benchmarking.


Subjects
Algorithms, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Bayes Theorem, Humans, Markov Chains, Monte Carlo Method
8.
Dev Cell ; 32(3): 265-77, 2015 Feb 09.
Article in English | MEDLINE | ID: mdl-25640223

ABSTRACT

Hox transcription factors (TFs) are essential for vertebrate development, but how these evolutionarily conserved proteins function in vivo remains unclear. Because Hox proteins have notoriously low binding specificity, they are believed to bind with cofactors, mainly homeodomain TFs Pbx and Meis, to select their specific targets. We mapped binding of Meis, Pbx, and Hoxa2 in the branchial arches, a series of segments in the developing vertebrate head. Meis occupancy is largely similar in Hox-positive and -negative arches. Hoxa2, which specifies second arch (IIBA) identity, recognizes a subset of Meis prebound sites that contain Hox motifs. Importantly, at these sites Meis binding is strongly increased. This enhanced Meis binding coincides with active enhancers, which are linked to genes highly expressed in the IIBA and regulated by Hoxa2. These findings show that Hoxa2 operates as a tissue-specific cofactor, enhancing Meis binding to specific sites that provide the IIBA with its anatomical identity.


Subjects
Branchial Region/metabolism, Developmental Gene Expression Regulation/physiology, Homeodomain Proteins/metabolism, Animals, Cell Line, Mice, Meis1 Protein, Neoplasm Proteins/metabolism, Transcription Factors/metabolism
9.
Sci Rep ; 4: 5183, 2014 Jun 05.
Article in English | MEDLINE | ID: mdl-24897937

ABSTRACT

Tendons are prominent members of the family of fibrous connective tissues (FCTs), which collectively are the most abundant tissues in vertebrates and have crucial roles in transmitting mechanical force and linking organs. Tendon diseases are among the most common arthropathy disorders; thus knowledge of tendon gene regulation is essential for a complete understanding of FCT biology. Here we show autonomous circadian rhythms in mouse tendon and primary human tenocytes, controlled by an intrinsic molecular circadian clock. Time-series microarrays identified the first circadian transcriptome of murine tendon, revealing that 4.6% of the transcripts (745 genes) are expressed in a circadian manner. One of these genes was Grem2, which oscillated in antiphase to BMP signaling. Moreover, recombinant human Gremlin-2 blocked BMP2-induced phosphorylation of Smad1/5 and osteogenic differentiation of human tenocytes in vitro. We observed dampened Grem2 expression, deregulated BMP signaling, and spontaneously calcifying tendons in young CLOCKΔ19 arrhythmic mice and aged wild-type mice. Thus, disruption of circadian control, through mutations or aging, of Grem2/BMP signaling becomes a new focus for the study of calcific tendinopathy, which affects 1-in-5 people over the age of 50 years.


Subjects
Bone Morphogenetic Protein 2/metabolism, Circadian Clocks/physiology, Proteins/metabolism, Tendons/physiology, Animals, Western Blotting, Bone Morphogenetic Protein 2/antagonists & inhibitors, Bone Morphogenetic Protein 2/genetics, Cell Differentiation, Cultured Cells, Cytokines, Developmental Gene Expression Regulation, Humans, Immunoenzyme Techniques, Mice, Phosphorylation, Proteins/genetics, Messenger RNA/genetics, Real-Time Polymerase Chain Reaction, Reverse Transcriptase Polymerase Chain Reaction, Signal Transduction, Tendons/cytology, Time Factors
10.
Stat Appl Genet Mol Biol ; 13(2): 203-16, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24413218

ABSTRACT

RNA-seq studies allow for the quantification of transcript expression by aligning millions of short reads to a reference genome. However, transcripts share much of their sequence, so that many reads map to more than one place and their origin remains uncertain. This problem can be dealt with using mixtures of distributions, so that estimating transcript expression reduces to estimating the weights of the mixture. In this paper, variational Bayesian (VB) techniques are used in order to approximate the posterior distribution of transcript expression. VB has previously been shown to be more computationally efficient for this problem than Markov chain Monte Carlo. VB methodology can precisely estimate the posterior means, but leads to variance underestimation. For this reason, a novel approach is introduced which integrates the latent allocation variables out of the VB approximation. It is shown that this modification leads to a better marginal likelihood bound and improved estimate of the posterior variance. A set of simulation studies and application to real RNA-seq datasets highlight the improved performance of the proposed method.


Subjects
Algorithms, High-Throughput Nucleotide Sequencing, RNA Sequence Analysis/methods, Genetic Transcription, Bayes Theorem, Computer Simulation, Gene Expression, Markov Chains, Monte Carlo Method
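
The abstract above frames transcript expression as the weights of a mixture over transcripts. The sketch below shows that framing on toy data. It deliberately uses plain expectation-maximisation (a maximum-likelihood point estimate) rather than the variational Bayes scheme the abstract describes, and the function name `em_mixture_weights` and the input layout `lik[i][k]` are hypothetical.

```python
def em_mixture_weights(lik, n_iter=200):
    """Estimate mixture weights by EM from per-read, per-transcript likelihoods.

    lik[i][k] is the (assumed known) probability of read i under transcript k.
    Plain EM is used here for brevity; it is not the variational Bayes
    method described in the abstract.
    """
    n_reads, n_tx = len(lik), len(lik[0])
    theta = [1.0 / n_tx] * n_tx  # start from uniform weights
    for _ in range(n_iter):
        counts = [0.0] * n_tx
        for row in lik:
            # E-step: responsibility of each transcript for this read
            joint = [theta[k] * row[k] for k in range(n_tx)]
            norm = sum(joint)
            for k in range(n_tx):
                counts[k] += joint[k] / norm
        # M-step: weights are the expected fraction of reads per transcript
        theta = [c / n_reads for c in counts]
    return theta

# Toy data: reads 0-2 map uniquely to transcript 0, read 3 maps uniquely to
# transcript 1, and read 4 is ambiguous between the two.
toy_lik = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = em_mixture_weights(toy_lik)
```

On this toy input the ambiguous read is shared in proportion to the current weights, so the estimate settles near 0.75 and 0.25. A Bayesian treatment would return a posterior distribution over the weights rather than a single point.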
11.
BMC Bioinformatics ; 14: 252, 2013 Aug 20.
Article in English | MEDLINE | ID: mdl-23962281

ABSTRACT

BACKGROUND: Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. RESULTS: We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. CONCLUSION: The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.


Subjects
Bayes Theorem, Computational Biology/methods, Gene Expression Profiling/methods, Genetic Models, Animals, Cluster Analysis, Drosophila/genetics, Drosophila/metabolism, High-Throughput Nucleotide Sequencing/methods, Normal Distribution, Oligonucleotide Array Sequence Analysis/methods
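
One common way to build a hierarchical GP over replicated time series is to treat each replicate as a shared gene-level function plus a replicate-specific deviation. The covariance between two observations is then a shared kernel plus a second kernel that applies only within a replicate. The sketch below assumes that construction; the abstract does not spell it out, and the squared-exponential kernels, hyperparameter values, and function names are illustrative choices only.

```python
import math

def rbf(t1, t2, variance, lengthscale):
    """Squared-exponential kernel between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def hierarchical_cov(t1, rep1, t2, rep2):
    """Covariance between two observations of one gene in a two-level hierarchy.

    Assumed structure: observation = shared gene-level function
    + replicate-specific deviation. Hyperparameter values are arbitrary.
    """
    shared = rbf(t1, t2, variance=1.0, lengthscale=2.0)
    # The replicate-level term only contributes within the same replicate
    within = rbf(t1, t2, variance=0.3, lengthscale=1.0) if rep1 == rep2 else 0.0
    return shared + within

# Time points need not be evenly spaced or shared between replicates
obs = [(0.0, "A"), (1.5, "A"), (0.7, "B"), (4.0, "B")]
K = [[hierarchical_cov(t1, r1, t2, r2) for (t2, r2) in obs] for (t1, r1) in obs]
```

Because the covariance depends only on the time values and replicate labels, irregular sampling across replicates needs no special handling in this formulation.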
12.
Arthritis Rheum ; 65(9): 2334-45, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23896777

ABSTRACT

OBJECTIVE: To characterize the circadian clock in murine cartilage tissue and identify tissue-specific clock target genes, and to investigate whether the circadian clock changes during aging or during cartilage degeneration using an experimental mouse model of osteoarthritis (OA). METHODS: Cartilage explants were obtained from aged and young adult mice after transduction with the circadian clock fusion protein reporter PER2::luc, and real-time bioluminescence recordings were used to characterize the properties of the clock. Time-series microarrays were performed on mouse cartilage tissue to identify genes expressed in a circadian manner. Rhythmic genes were confirmed by quantitative reverse transcription-polymerase chain reaction using mouse tissue, primary chondrocytes, and a human chondrocyte cell line. Experimental OA was induced in mice by destabilization of the medial meniscus (DMM), and articular cartilage samples were microdissected and subjected to microarray analysis. RESULTS: Mouse cartilage tissue and a human chondrocyte cell line were found to contain intrinsic molecular circadian clocks. The cartilage clock could be reset by temperature signals, while the circadian period was temperature compensated. PER2::luc bioluminescence demonstrated that circadian oscillations were significantly lower in amplitude in cartilage from aged mice. Time-series microarray analyses of the mouse tissue identified the first circadian transcriptome in cartilage, revealing that 615 genes (∼3.9% of the expressed genes) displayed a circadian pattern of expression. This included genes involved in cartilage homeostasis and survival, as well as genes with potential importance in the pathogenesis of OA. Several clock genes were disrupted in the early stages of cartilage degeneration in the DMM mouse model of OA. CONCLUSION: These results reveal an autonomous circadian clock in chondrocytes that can be implicated in key aspects of cartilage biology and pathology. Consequently, circadian disruption (e.g., during aging) may compromise tissue homeostasis and increase susceptibility to joint damage or disease.


Subjects
Articular Cartilage/metabolism, Chondrocytes/metabolism, Circadian Clocks/physiology, Gene Expression Regulation, Homeostasis/genetics, Animals, Experimental Arthritis/genetics, Experimental Arthritis/metabolism, Cell Line, Humans, Male, Mice, Osteoarthritis/genetics, Osteoarthritis/metabolism, Period Circadian Proteins/genetics, Period Circadian Proteins/metabolism
13.
Nucleic Acids Res ; 40(9): 3990-4001, 2012 May.
Article in English | MEDLINE | ID: mdl-22223247

ABSTRACT

The regulation of gene expression is central to developmental programs and largely depends on the binding of sequence-specific transcription factors with cis-regulatory elements in the genome. Hox transcription factors specify the spatial coordinates of the body axis in all animals with bilateral symmetry, but a detailed knowledge of their molecular function in instructing cell fates is lacking. Here, we used chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) to identify Hoxa2 genomic locations in a time and space when it is actively instructing embryonic development in mouse. Our data reveal that Hoxa2 has large genome coverage and potentially regulates thousands of genes. Sequence analysis of Hoxa2-bound regions identifies high occurrence of two main classes of motifs, corresponding to Hox and Pbx-Hox recognition sequences. Examination of the binding targets of Hoxa2 faithfully captures the processes regulated by Hoxa2 during embryonic development; in addition, it uncovers a large cluster of potential targets involved in the Wnt-signaling pathway. In vivo examination of canonical Wnt-β-catenin signaling reveals activity specifically in the Hoxa2 domain of expression, and this is undetectable in Hoxa2 mutant embryos. The comprehensive mapping of Hoxa2-binding sites provides a framework to study Hox regulatory networks in vertebrate developmental processes.


Subjects
Embryonic Development/genetics, Homeodomain Proteins/metabolism, Wnt Signaling Pathway/genetics, Animals, Binding Sites, Branchial Region/metabolism, Chromatin Immunoprecipitation, Genome, High-Throughput Nucleotide Sequencing, Homeodomain Proteins/genetics, Mice, DNA Sequence Analysis, beta Catenin/metabolism
14.
Gastrointest Endosc ; 74(5): 1033-9.e1-3; quiz 1115.e1-4, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22032317

ABSTRACT

BACKGROUND: Significant mortality after gastrostomy insertion remains and some risk factors have been identified, but no predictive scoring system exists. OBJECTIVE: To identify risk factors for mortality, formulate a predictive scoring system, and validate the score. Comparison to an artificial neural network (ANN). DESIGN: Endoscopic database analysis. SETTING: Six hospitals (2 teaching hospitals) in the South Yorkshire region, United Kingdom. PATIENTS: This study involved all patients referred for gastrostomy insertion. INTERVENTION: Generation of clinical scores to predict 30-day mortality in patients undergoing gastrostomy insertion. MAIN OUTCOME MEASUREMENTS: Risk factors for 30-day mortality. Internal and external validation of the score. Comparison with an ANN. RESULTS: Univariate analysis showed that 30-day mortality was associated with age, albumin levels, and cardiac and neurological comorbidities. Multivariate analysis showed that only age and albumin levels were independent. Modeling provided scores of 0, 1, 2, and 3 corresponding to 30-day mortalities of 0% (0-2.1), 7% (2.9-13.9), 21.3% (13.5-30.9), and 37.3% (24.1-51.9), respectively. Application of the scoring system at the other teaching hospital and the 4 district general hospitals gave 30-day mortality rates that were not significantly different from those predicted. Receiver operating characteristic curves for the score and the ANN were comparable. LIMITATIONS: Nonrandomized study. Score not used as a decision-making tool. CONCLUSION: The gastrostomy score provides an estimate of 30-day mortality for patients (and their relatives) when gastrostomy insertion is being discussed. This score requires evaluation as a decision-making tool in clinical practice. ANN analysis results were similar to the outcomes from the clinical score.


Subjects
Decision Support Techniques, Gastrostomy/mortality, Serum Albumin, Age Factors, Aged, Female, Humans, Kaplan-Meier Estimate, Male, Middle Aged, Multivariate Analysis, Neural Networks (Computer), ROC Curve, United Kingdom
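
The score-to-mortality mapping reported in the abstract above can be written as a small lookup, sketched below. The numbers are copied from the abstract. Reading the bracketed ranges as confidence intervals is an assumption, and the names `MORTALITY_BY_SCORE` and `estimated_30_day_mortality` are hypothetical. The abstract does not state how the score is derived from age and albumin, so the sketch does not attempt to compute it.

```python
# Values copied from the abstract: score -> (30-day mortality %, (low, high)).
# The bracketed ranges are assumed here to be confidence intervals; the
# abstract does not say so explicitly.
MORTALITY_BY_SCORE = {
    0: (0.0, (0.0, 2.1)),
    1: (7.0, (2.9, 13.9)),
    2: (21.3, (13.5, 30.9)),
    3: (37.3, (24.1, 51.9)),
}

def estimated_30_day_mortality(score):
    """Return the reported 30-day mortality (%) and its range for a score of 0-3."""
    if score not in MORTALITY_BY_SCORE:
        raise ValueError("score must be 0, 1, 2 or 3")
    return MORTALITY_BY_SCORE[score]
```

For example, `estimated_30_day_mortality(2)` returns the 21.3% figure together with its reported range.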